Bayesian Language Modelling of German Compounds

Authors

  • Jan A. Botha
  • Chris Dyer
  • Phil Blunsom
Abstract

In this work we address the challenge of augmenting n-gram language models according to prior linguistic intuitions. We argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In our empirical evaluation the model outperforms a modified Kneser-Ney n-gram model in test set perplexity. When used as part of a translation system, the proposed language model matches the baseline BLEU score for English→German while improving the precision with which compounds are output. We find that an approximate inference technique inspired by the Bayesian interpretation of Kneser-Ney smoothing (Teh, 2006) offers a way to drastically reduce model training time with negligible impact on translation quality.

Title and abstract in Afrikaans

Bayes-modellering van saamgestelde woorde in Duits

Hierdie werk neem uitdagings rondom die uitbreiding van n-gramtaalmodelle volgens voorafgaande linguistieke intuïsie onder die loep. Ons voer aan dat die familie van hiërargiese Pitman-Yor taalmodelle ’n wenslike stuk gereedskap is om hierdie probleem mee aan te pak en formuleer ’n model van Duitse saamgestelde woorde om die benadering te demonstreer. Met behulp van ’n empiriese evaluering bevind ons dat die model in terme van toetsdataperpleksiteit beter vaar as die aangepaste Kneser-Ney n-grammodel. As onderdeel van ’n Engels→Duitsvertalingstelsel behaal die model in terme van die BLEU-metriek dieselfde vertaalafvoerkwaliteit as die kontrole stelsel en genereer saamgestelde woorde teen ’n hoër presisie. Verder stel ons vas dat ’n benaderde inferensietegniek, geïnspireer deur die Bayes-interpretasie van Kneser-Ney-gladstryking (Teh, 2006), gebruik kan word om die modelberamingtyd drasties te verminder sonder wesenlike impak op die vertaalafvoerkwaliteit.


Similar resources

Hierarchical Bayesian Language Modelling for the Linguistically Informed

In this work I address the challenge of augmenting n-gram language models according to prior linguistic intuitions. I argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In an empirical evaluation, the model outperforms the Kneser-Ney model in terms...


Willingness to Communicate (WTC) among Beginning-level German Learners: Teaching German as a Foreign Language in a U.S. University Classroom

This action research examines the concept of Willingness to Communicate (WTC) in a second language acquisition context. The researcher investigated the contributors of WTC in a foreign language classroom setting. Therefore, a multiple assignments method and sequence was applied. Participants of this study were students who matriculated in a United States (U.S.) undergraduate program, studying G...



Bayesian perspective over time

Thomas Bayes, the founder of Bayesian vision, entered the University of Edinburgh in 1719 to study logic and theology. Returning in 1722, he worked with his father in a small church. He also was a mathematician and in 1740 he made a novel discovery which he never published, but his friend Richard Price found it in his notes after his death in 1761, reedited it and published it. But until L...




Publication date: 2012